Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning
Transfer learning leverages feature representations of deep neural networks
(DNNs) pretrained on source tasks with rich data to empower effective
finetuning on downstream tasks. However, the pretrained models are often
prohibitively large for delivering generalizable representations, which limits
their deployment on edge devices with constrained resources. To close this gap,
we propose a new transfer learning pipeline, which leverages our finding that
robust tickets can transfer better, i.e., subnetworks drawn with properly
induced adversarial robustness can achieve better transferability than vanilla
lottery ticket subnetworks. Extensive experiments and ablation studies validate
that our proposed transfer learning pipeline can achieve enhanced
accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity
patterns, further enriching the lottery ticket hypothesis.
Comment: Accepted by DAC 202
NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To this end, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
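The abstract does not say how the contraction step works. As a minimal sketch under one assumption (that the expanded layers are purely linear at contraction time, with no activation between them), two stacked linear layers can be merged into a single layer exactly, so the contracted network keeps the tiny model's inference cost. The names `matvec`, `matmul`, `w_expand`, and `w_project` are hypothetical and chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


# Assumed "expanded" form: a 2 -> 3 layer followed by a 3 -> 2 layer,
# with no nonlinearity in between.
w_expand = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
w_project = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]

# "Contracted" form: one 2 -> 2 layer that computes the same function.
w_merged = matmul(w_project, w_expand)

x = [1.0, -1.0]
expanded_out = matvec(w_project, matvec(w_expand, x))
contracted_out = matvec(w_merged, x)
```

The merge is exact only when nothing nonlinear sits between the two layers, so any scheme built on this identity has to remove or linearize those activations before contracting.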
Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators
While maximizing deep neural networks' (DNNs') acceleration efficiency
requires a joint search/design of three different yet highly coupled aspects,
including the networks, bitwidths, and accelerators, the challenges associated
with such a joint search have not yet been fully understood and addressed. The
key challenges include (1) the dilemma between exploding memory consumption,
due to the huge joint space, and settling for sub-optimal designs, (2) the
discrete nature of the accelerator design space, which is coupled with yet
different from that of the networks and bitwidths, and (3) the chicken-and-egg
problem associated with network-accelerator co-search, i.e., co-search requires
operation-wise hardware cost, which is unavailable during search because the
optimal accelerator depends on the whole network and is still unknown. To
tackle these daunting challenges towards optimal and fast development of DNN
accelerators, we propose a framework dubbed Auto-NBA to enable jointly
searching for the Networks, Bitwidths, and Accelerators, by efficiently
localizing the optimal design within the huge joint design space for each
target dataset and acceleration specification. Our Auto-NBA integrates a
heterogeneous sampling strategy to achieve unbiased search with constant memory
consumption, and a novel joint-search pipeline equipped with a generic
differentiable accelerator search engine. Extensive experiments and ablation
studies validate that both Auto-NBA-generated networks and accelerators
consistently outperform state-of-the-art designs (including
co-search/exploration techniques, hardware-aware NAS methods, and DNN
accelerators), in terms of search time, task accuracy, and accelerator
efficiency. Our code is available at https://github.com/RICE-EIC/Auto-NBA.
Comment: Accepted at ICML 202
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployment of TNNs on edge devices,
which are constrained by strict limits on memory, computation,
bandwidth, and power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller
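The abstract names gradient surgery but does not specify its form. The sketch below assumes a PCGrad-style projection: when two gradients on shared weights conflict (negative dot product), the conflicting component of one is removed. Which gradient gets projected, and the names `dot` and `project_if_conflicting`, are assumptions made for illustration, not details taken from NetDistiller.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(g, h):
    """Return g, minus its component along h when g and h conflict.

    Two gradients conflict when their dot product is negative. In that
    case g is projected onto the plane orthogonal to h; otherwise g is
    returned unchanged.
    """
    gh = dot(g, h)
    hh = dot(h, h)
    if gh >= 0 or hh == 0:
        return list(g)
    scale = gh / hh
    return [gi - scale * hi for gi, hi in zip(g, h)]


# Hypothetical gradients on the shared weights.
student_grad = [1.0, 0.0]
teacher_grad = [-1.0, 1.0]

adjusted = project_if_conflicting(student_grad, teacher_grad)
unchanged = project_if_conflicting([1.0, 0.0], [1.0, 1.0])
```

After projection, the adjusted gradient has zero dot product with the other gradient, so a step along it no longer works directly against the other objective.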
Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
While increasingly deep networks are still in general desired for achieving
state-of-the-art performance, for many specific inputs a simpler network might
already suffice. Existing works exploited this observation by learning to skip
convolutional layers in an input-dependent manner. However, we argue their
binary decision scheme, i.e., either fully executing or completely bypassing
one layer for a specific input, can be enhanced by introducing finer-grained,
"softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS)
framework. The core idea of DFS is to hypothesize layer-wise quantization (to
different bitwidths) as intermediate "soft" choices to be made between fully
utilizing and skipping a layer. For each input, DFS dynamically assigns a
bitwidth to both weights and activations of each layer, where fully executing
and skipping could be viewed as two "extremes" (i.e., full bitwidth and zero
bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive
power during input-adaptive inference, enabling finer-grained
accuracy-computational cost trade-offs. It presents a unified view to link
input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive
experimental results demonstrate the superior tradeoff between computational
cost and model expressive power (accuracy) achieved by DFS. More visualizations
also indicate a smooth and consistent transition in the DFS behaviors,
especially the learned choices between layer skipping and different
quantizations when the total computational budgets vary, validating our
hypothesis that layer quantization could be viewed as intermediate variants of
layer skipping. Our source code and supplementary material are available at
https://github.com/Torment123/DFS.
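The view of skipping as "zero bitwidth" can be illustrated with a small sketch. It assumes a residual layer of the form y = x + W x, a symmetric uniform quantizer on [-1, 1], and a bitwidth supplied by the caller; the gating policy that DFS learns per input is not modeled here. The names `quantize` and `residual_layer` are hypothetical.

```python
def quantize(values, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize values in [lo, hi] to 2**bits levels.

    bits == 0 is treated as "no information": every value becomes 0.0.
    """
    if bits == 0:
        return [0.0 for _ in values]
    steps = 2 ** bits - 1
    step = (hi - lo) / steps
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def residual_layer(x, weight_rows, bits):
    """Compute y = x + W_q x_q with weights and activations at `bits` bits.

    With bits == 0 the branch contributes nothing and the layer reduces
    to the identity, i.e. the layer is skipped.
    """
    if bits == 0:
        return list(x)
    xq = quantize(x, bits)
    out = []
    for xi, row in zip(x, weight_rows):
        wq = quantize(row, bits)
        out.append(xi + sum(w * a for w, a in zip(wq, xq)))
    return out


x = [0.3, -0.6]
w = [[0.5, -0.25], [0.1, 0.9]]

skipped = residual_layer(x, w, bits=0)   # identical to x
coarse = residual_layer(x, w, bits=2)    # cheap, low-precision branch
fine = residual_layer(x, w, bits=8)      # close to the full-precision result
```

Bitwidths between 0 and the full precision then act as the intermediate "soft" choices the abstract describes, each trading branch accuracy against computational cost.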
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by leveraging human natural
language instead of domain-specific languages. Specifically, we first perform
an in-depth investigation into LLMs' limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Furthermore, drawing inspiration from the above insights, we develop a
framework called GPT4AIGChip, which features an automated demo-augmented
prompt-generation pipeline utilizing in-context learning to guide LLMs towards
creating high-quality AI accelerator designs. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.
Comment: Accepted by ICCAD 202